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Abstract 

This paper addresses the prediction of functional time series. Existing contributions to this 
problem have largely focused on the special case of first-order functional autoregressive processes 
because of their technical tractability and the current lack of advanced functional time series 
methodology. While the linear prediction equations for any stationary functional time series can be 
stated explicitly, it seems in most situations infeasible to solve them in practice. Using functional 
principal components analysis, it is shown here how standard multivariate prediction techniques 
can be utilized instead. The connection between functional and multivariate predictions is made 
precise for the important case of vector and functional autoregressions. The proposed method is 
easy to implement, making use of existing statistical software packages, and may therefore be at- 
tractive to a broader, possibly non-academic, audience. Its practical applicability is demonstrated 
in a simulation study and an application to environmental data, namely the prediction of daily 
pollution curves describing the concentration of particulate matter in ambient air. It is found 
that the proposed prediction method, if based on the multivariate innovations algorithm, often 
outperforms the standard functional prediction technique. 

Keywords: Dimension reduction; Forecasting, Functional autoregressions; Functional principal 
components, Functional time series; Particulate matter, Vector autoregressions 
MSC 2010: Primary 62M10, 62M20; Secondary 62P12, 60G25 

1 Introduction 

Functional data are often collected in sequential form. The common situation is a continuous-time 

record that can be separated into natural consecutive time intervals, such as days, for which a 
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reasonably similar behavior is expected. Typical examples include the daily price and return curves 
of financial transactions data and the daily patterns of geophysical, meteorological and environmental 
data. The resulting functions may be described by a time series (Y& : k € Z) , each term in the sequence 
being a (random) function Y)-(t) defined for t taking values in some interval [a, b]. Here, Z denotes the 
set of integers. The object (Yfc : k G Z) will be referred to as a functional time series (see Hormann 
and Kokoszka [15] for a recent survey on time series aspects, and Ferraty and Vieu [13] and Ramsay 
and Silverman [19] for general introductions to functional data analysis). Interest for this paper is 
in the functional modeling of concentration of particulate matter with an aerodynamic diameter of 
less than 10/j,m in ambient air, measured half-hourly in Graz, Austria. It is widely accepted that 
exposure to high concentrations can cause respiratory and related health problems. Local policy 
makers therefore monitor these pollutants closely. The prediction of concentration levels is then a 
particularly important tool for judging whether measures, such as partial traffic regulation, have to 
be implemented in order to meet standards set by the European Union. 

Providing reliable guesses for future realizations is in fact one of the most important goals of any 
time series analysis. In the univariate and multivariate framework, this is often achieved by setting up 
general prediction equations that can be solved recursively by methods such as the Durbin-Levinson 
and innovations algorithms (see, for example, 0EO]). Prediction equations may be derived explicitly 
also for general stationary functional time series (see Section 1.6 of the monograph Bosq [8]) but they 
seem difficult to solve and implement. As a consequence, much of the research in the area has focused 
on the first-order functional autoregressive model, shortly FAR(l). Bosq [8] has derived one-step 
ahead predictors that are based on a functional form of the Yule- Walker equations. Besse et al. [7] 
have proposed nonparametric kernel predictors and illustrated their methodology by forecasting 
climatological cycles caused by the El Nino phenomenon. While this paper, and also Besse and 
Cardot [6], have adapted classical spline smoothing techniques, Antoniadis and Sapatinas |5], see also 
Antoniadis et al. 013], have studied FAR(l) curve prediction based on linear wavelet methods. Kargin 
and Onatski [16j have introduced the predictive factor method, which seeks to replace functional 
principal components with directions most relevant for predictions. Diderickson et al. |12j have 
evaluated several competing prediction models in a comparative simulation study, finding Bosq's |8j 
method to have the best overall performance. Other contributions to the area are Aneiros-Perez et 
al. [I], and Aneiros-Perez and Vieu [2]. 
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In spite of its statistical relevance and its mathematical appeal, functional time series modeling has 
still some unpleasant limitations for the practitioner. First, to date there are not many "ready to use" 
statistical software packages that can be utilized directly for estimation and prediction purposes. The 
only available package that the authors are aware of is the far package of Damon and Guillas for 
the statistical software R. The lack of tailor-made procedures requires manual implementation. This 
may be challenging and therefore restrict use of the methodology to an academic audience. Second, 
the methodology developed for the FAR(l) case is difficult to generalize. What can be done if an 
FAR(l) approach is infeasible? In addition, how can exogenous predictors and further lags beyond 
the first be incorporated? For these cases functional theory and estimation becomes even more 
complex. Research addressing these questions is scarce, Damon and Guillas [10] being an exception. 
These authors include exogenous covariates into an autoregressive framework for functional ozone 
predictions. 

The goal of this paper is then to fill in this gap by promoting a simple alternative prediction 
algorithm which consists of three basic steps, all of which are easy to implement. First, use functional 
principal components analysis, FPCA, to transform the functional time series observations Yi, . . . ,Y n 
into a vector time series of FPCA scores Y±, . . . , Y n of low dimension p, where p is typically no more 
than 4. Second, fit a vector time series to the FPCA scores and obtain the predictor Y n+ i for Y n+ i. 
Third, utilize the Karhunen-Loeve expansion to re-transform Y n+ \ into a curve predictor Y n+ i. The 
first and the third step are simple and can be performed, for example, with the fda package in R. 
The second step may be tackled with standard multivariate time series methodology. This approach 
is developed in detail in Section [2] Two things are worth stressing. One, it will be shown that Bosq's 
[H] classical FAR(l) prediction may be viewed as equivalent to the proposed procedure if a (non- 
standard) first-order vector autoregression, VAR(l), is fit in the second step. Such a relationship 
does not appear evident and will be worked out as part of Section |3j Two, the previous observation 
gives hope for improved predictions because, instead of VAR(l) fitting, a variety of existing tools 
for vector processes can be entertained. This is particularly helpful, since functional time series 
methodology is still in its infancy. The important case of predicting with exogenous covariates is 
presented in Section |4| 

The remainder of the paper contains a supporting simulation study in Section[5]and an application 
of the new prediction methodology to the forecasting of intraday patterns of particulate matter 
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concentrations in Section [6j Section [7] concludes and technical proofs are given in Appendix [A} 

2 Methodology 

In what follows, let (Y& : fc 6 Z) be an arbitrary stationary functional time series. It is assumed 
that the observations Y k are elements of the Hilbert space H = L 2 ([0,1]) equipped with the inner 
product (x,y) = x(t)y(t)dt. Each Yjt is therefore a square integrable function satisfying ||lfc|| 2 = 
f Y£(t)dt < oo. All random functions are defined on some common probability space (Cl,A,P). 
The notation Y S L P H = L p H {yi,A,P) is used to indicate that, for some p > 0, < oo. Any 

Y G L l H possesses then a mean curve fj, = (E[Y(t)} : t G [0, 1]), and any Y £ L 2 H a covariance operator 
C, defined by C{x) = E[(Y — fj,,x)(Y — (j,)]. The operator C is a kernel operator given by 

C(x)(t)= f c{t,s)x(s)ds, c(t,s) = Cov(X(t),X(s)). 
Jo 

As in the multivariate case, C admits the spectral decomposition 

oo 

C(x) = ^2x e (v £ ,x}v e , 
1=1 

where (A^ : I E N) are the eigenvalues (in strictly descending order) and (vi : £ G N) the corresponding 
normalized eigenfunctions, so that C{vg) = \gvi and \\V^\ = 1. Here, N is the set of positive integers. 
The (vf. £ £ N) form an orthonormal basis, ONB, of L 2 ([0,1]). Hence Y allows for the Karhunen- 
Loeve representation Y = X^fci(^ / > v i) v £- The coefficients (Y, vp) in this expansion are called the 
functional principal components, FPCs, of Y. 

Suppose now that we have observed Yi,...,Y n . In practice fi as well as C and its spectral 
decomposition will be unknown and need to be estimated from the sample. To estimate //, let 

1 n 

M) = -YlY k (t), t€[0,l], 
k=l 

be the sample mean function. Theorem 4.1 of Hormann and Kokoszka [H] implies that for a large 
class of stationary sequences -E[||/in — /i|| 2 ] = 0(n _1 ), thereby showing that p, n is -y/n-consistent for 
\i. For this reason estimation of the mean curve can be done in a separate step, and henceforth the 
more convenient assumption £7[Yfe] = 0, the zero function, is adopted. The covariance operator and 
its eigenvalues and eigenfunctions can be estimated using the sample covariance estimator 

1 n 

C n (x) = - }(Y k - fi n ,x)(Y k - p, n ), 

k=l 
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respectively. An application of Theorem 2.1 in Hormann and Kokoszka [M] yields that this estimator 
is -^/n-consistent for C. More precisely, [ 1 1 CVi — C[|^] = 0(n _1 ), where the operator norm || • ||£ is, 
for any operator A, defined by 

\\A\\ C = sup \\A(x)\\. 

\\x\\<l 

From C n , estimated eigenvalues \\ n , • • • , A p>n and estimated eigenfunctions Vi >ri , ■ ■ . , v PjTl can be com- 
puted for an arbitrary fixed, but typically small, p < n. These estimators inherit \/n-consistency 
from C n . For notational convenience, Ag and vg will often be used in place of A^ n and vi n . 

Functional linear prediction equations for the general case have been derived in Section 1.6 of 
the monograph Bosq [8]. They appear to be infeasible in most situations. As pointed out in the 
introduction, the notable exception is the FAR(l) process defined by the stochastic recursion 

Y k - ii = - n) + £k, keZ, (2.1) 

where (e& : k G Z) are centered, independent and identically distributed innovations in L 2 H and \I/ 
a bounded linear operator satisfying \\^!\\c < 1. The latter condition ensures that the recurrence 



equations (2.1) have a strictly stationary and causal solution. Bosq [S] has in the FAR(l) case used 
the prediction equations to devise what is now often referred to as the common predictor. This 
one-step ahead prediction is based on an estimator ^> n of ^ and then given by Y n+ \ = ^ n Y n . Details 
of this method are given in Section [3| where it will be used as a benchmark to compare with the 
novel methodology to be introduced in the following. The new prediction technique avoids estimating 
operators directly and instead utilizes existing multivariate prediction methods. 

The proposed prediction algorithm proceeds in three steps. First, select p, the number of principal 
components to be included in the analysis, for example by ensuring that a certain fraction of the 
data variation is explained. With the sample eigenfunctions, empirical FPC scores y\i = (Yfc, vi) can 
now be computed for each combination of observations Yk, k = l,...,n, and sample eigenfunction 
ve, I = 1, . . . ,p. The superscript e emphasizes that empirical versions are considered. Create from 



the FPC scores the vectors 



Y% — (ytv • • • i vLp) i 



where ' signifies transposition. By nature of FPCA, the vector Y e k contains most of the information 
on the curve Y^. In the second step, fix the prediction lag h. Then, use multivariate prediction 
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techniques to produce the h-step ahead prediction 

^n+h = (Vn+hyli ■ ■ ■ > Vn+h,p) 

given the vectors Yf,...,Y^. Standard methods such as the Durbin-Levinson and innovations 
algorithm can be readily applied, but other options such as exponential smoothing and nonparametric 
prediction algorithms are available as well. In the third and last step, the multivariate predictions 
are re-transformed to functional objects. This conversion is achieved by defining the truncated 
Karhunen-Loeve representation 

Y n+h = Vn + h,l h H + Vn+h,pVp (2.2) 

based on the predicted FPC scores y\ e and the estimated eigenfunctions V£. The resulting Y n+ h 
is then used as the /i-step ahead functional prediction of Y n+ h. The three prediction steps are 
summarized in Algorithm [TJ 

Algorithm 1 Functional Prediction 

1. Fix p. For k = 1, . . . , n, use the data Yx, . . . , Y n to compute the vectors 

Y% = {Uk,li ■ ■ ■ i Vk,p) ) 
containing the first p empirical FPC scores y% t = (Yfc, ve)- 

2. Fix h. Use Y\, . . . , Y e n to determine the /i-step ahead prediction 

^n+h = (Vn+h,li • • • > Vn+h,p) 

for Y e n+h with an appropriate multivariate algorithm. 

3. Use the functional object 

Y n +h = Vn+h,! V\ + ... + Vn+h,p^p 

as h-step ahead prediction for y n+ ^. 



Several remarks are in order. The proposed algorithm is conceptually simple and allows for several 
immediate extensions and improvements as it is not bound by an assumed FAR(l) structure and, 
in fact, any other particular functional time series specification. This is important because there 
is no well developed theory for functional versions of the the well-known linear ARMA time series 
models ubiquitous in univariate and multivariate settings. Moreover, if an FAR(l) structure is indeed 
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imposed on (Yj% : k G Z), then it appears plausible that Y\, • • • , should approximately follow an 
VAR(l) model. This statement will be made precise in Section [3j 

The FAR(l) model should in practice be employed only if it provides a reasonable approximation 
to the unknown underlying dynamics. To allow for more flexible predictions, higher-order FAR 
processes could be studied. There is, however, until now no method available in the literature that 
would aid practitioners in determining the appropriate order of a functional autoregressive process 
and their application for prediction purposes appears therefore to be of little practical use. The 
proposed methodology avoids these difficulties. It can, in fact, be applied to any stationary functional 
time series. For example, by utilizing the multivariate innovations algorithm (see Section 11.4 in |9j) 
in the second step of Algorithm [I] How this is done in the present prediction setting is outlined in 
Algorithm [2] for the case h = 1. 

Algorithm 2 The Innovations Algorithm for Step 2 in Algorithm [T] 

1. Fix m G {1, . . . , n}. The last m observations will be used to compute the predictor. 

2. For k = 0, 1, . . . ,m, compute 



1 



k=l 



where Y 



n 



1 



Y.k=i Y k- 



3. Set 



m 




where 



e 00 = r(o), 



k-l 



e 



m,m—k 



f (n - k) - y~] ® m ^ m -j®joQ' h 



k,k—j 



& 



-1 
fcO ' 



k = 0, . . . , m — 1 



j=0 




j=0 



The recursion is solved in the order 600; ©io; @22 5 ©21, ©20; • • • 



Instead of the innovations algorithm, standard linear prediction equations can be employed. This 
is detailed in a more general setting allowing for the inclusion of covariates in Section [4j The 
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advantage of the innovations algorithm is that it can be updated quickly when new observations 
arrive. Note, however, that updating usually means including further lags into prediction algorithm: 
X n _ m+ i, . . . , X n+ i are used to predict X n+ 2, then X n _ m+ i, . . . , X n+ 2 are used to predict X n+ 3, and 
so on. In order to apply Algorithm 2 this in turn requires estimation of covariances T(k) for increasing 
lag k. Such estimates are less reliable the smaller n and the larger k. Therefore including too many 
lag values has a negative effect in estimation. The simulation study in Section [5] considers cases for 
which m < 4. 

If estimated eigenfunctions and the covariance matrices T(k) are replaced by population analogues, 
then this algorithm gives the best linear prediction (in mean square sense) of the population FPC 
scores based on the last m observations. It will be demonstrated in Sections [U and [6] that the 
innovations algorithm based predictions are best among a number of competitors when the true 
model deviates from an FAR(l). 

It should be emphasized that the numerical implementation of the new prediction methodology 
is convenient in R. For the first step, FPC score matrix (Y\ : . . . : Y e n ) and corresponding empirical 
eigenfunctions can be readily obtained with the fda package. For the second step, forecasting for 
the FPC scores can be done in another routine step using the vars package in case VAR models are 



employed. The obtained quantities can be easily combined for obtaining (2.2). 



3 Predicting first-order functional autoregression 
3.1 The standard predictor 

The FAR(l) is the most often applied functional time series model. It will be used here as a benchmark 
to compare the proposed methodology to. In order to obtain Bosq's [H] predictor, estimation of the 
autoregressive operator \& is briefly discussed. The approach is based on a functional version of the 
Yule- Walker equations. Let then (Y k : k S Z) be an FAR(l) time series for which /i = without loss 



of generality. Applying E[{-, x)Y k _i] to (2.1) for any x G H, leads to the relations 



E[(Y k , x)y fc _i] = (Y fc _i), x)y fc _i] + E[(s k , x)F fc _i] 
= E[(*{Y k -. 1 ),x)Y k - 1 ]. 

Let again C(x) = E[(Yi, x)Yi] be the covariance operator of Y\ and also let D(x) = E[{Y\,x)Yq\ 
be the cross-covariance operator of Yq and Y\. If VP' denotes the adjoint operator of ^, given by 
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the requirement (^(x),y) = (x,^'(y)), the operator equation D(x) = C(^>'(x)) is obtained. This 
formally gives $>(x) = D'C^ix), where D'(x) = E[(Y ,x)Y 1 ]. The operator D' can be estimated 
by D'(x) = J2k=2^Xk-i: x )Yk- A more complicated object is the unbounded operator C _1 . 
Using the spectral decomposition of C n , it can be estimated by C~ 1 (x) = ^2^ =1 Xj 1 (ve,x)v£ for 
an appropriately chosen p. Combining these results with an additional smoothing step, using the 
approximation Yp. X^ =1 (Yfc, gives the estimator 

n V V 
k=2 1=1 l'=l 

for ^(x). The foregoing gives rise to the functional predictor 

Y n+1 = §„(y n ) (3.2) 



for Y n+ \. This is the estimator of Bosq |8j. In the next section, the predictor (3.2) is compared to 
the proposed predictor. 

3.2 Fitting vector autoregression to FPC scores 



The main goals of this section are to show that the one-step predictors Y n +i in (2.2), based on 



fitting VAR(l) models in Step 2 of Algorithm [lj and Y n+ \ in (3.2) are asymptotically equivalent for 



FAR(l) processes, and that the FPC score vectors Yf, . . . , Y e n follow indeed a VAR(l) model, albeit 
a non-standard one. The first statement is justified in the next theorem. 

Theorem 3.1. Assume that a VAR(l) model is fit to Y\, . . . ,Y^ by means of ordinary least squares. 



The resulting predictor (2.2) is asymptotically equivalent to (3.2). More specifically, 



| Y n+ i -Y n+1 1| = Op I -= ) (n-too). 



The proof of Theorem 3.1 is given in the Section |A,l[ where moreover the exact difference between 



the two predictors is detailed. The finite sample performance of Y n+ \ and Y n+ \ were compared in a 
simulation study whose results are reported in Section [5} 

In case of a VAR(l), Step 2. of Algorithm [j] can be performed with least squares. To explicitly 
calculate Y n+1 , apply (-,ve) to both sides of Y^ = \P(Yfe_x) + Ek to obtain 

(Y k ,v e ) = (V(Y k -{),vt) + {e k ,v £ ) 
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t'=i 
v 

ve'),Vi) + 5 ki , (3.3) 



with remainder terms 



given by 



*M = d kj + d kl + 4/ + ( £ k^i) 



1 kl 



p 

4/ = 5Z «**-!> - (Y k -i,ve)){^M,v e ), 

£'=1 

oo 

d S = 53 (Yk-i,vv)(*(vt,),v t ). 
l'=p+l 

Some notation is needed. Set = ((£&, v\), . . . , (£&, and = (it^i, . . • , Uk, p )' where u^g = 
Y^£'> p {Yk-li v i'){'^{ v i')} v £)i an d let Bp G M pxp be the matrix with entry t^/) in the Ah row and 

the f th column, £,£' = !,... ,p. Let moreover /3 = vee(B' p ), Z = (Y' 2 , Y' n )' , E = (e' 2 , . . . , e'J', 
U = (w' 2 , . . . , u' n )', Xk = I p <8> ^ and X = (X[ : . . . : Replacing the eigenfunctions V£ by 

their sample counterparts V£, empirical versions of the above variables are denoted by Y"|, Z e , Xf, 
X e , Bp and /3 p . For a vector x £ MP 2 , the operation mat(x) creates apxp matrix, whose £-th column 
contains the elements vn^ p+ i, . . . , vg p . Define now 6k = {5k,i, ■ ■ ■ , <5fc,p)' to arrive at the equations 

BtYl^ + dk, k = 2,...,n. (3.4) 



The equations in (3.4) formally resemble VAR(l) equations. Notice, however, that it is a nonstandard 
formulation, since the errors 8k are generally not centered and dependent. Furthermore, 8k depends 
in a complex way on Yf,_ 1 , so that the errors are not uncorrelated with past observations. The 
coefficient matrix B p is also random, but fixed for fixed sample size n. In the sequel these effects are 



ignored. Utilizing some matrix algebra, (3.4) can be written as the linear regression 

Z e = X e f3 e p + A, (3.5) 

where A = (8' 2 , ■ . ■ , 8' n )' . The ordinary least squares estimator is then p p = (X e> ' X")' 1 X e ' ' Z e , and 
the prediction equation 

Y n+ \ = BpY^ = (yn+i,ii ■■■■> 2/n+i,p)'> (3-6) 
10 



follows directly, denning = mat( / 9 p ) / . 
3.3 Estimation by GLS 

If the functional sequence (Y^: k £ Z) follows an FAR(l) process, then the errors S k in the VAR(l) 



model (3.4) are correlated. This could be taken into account by applying generalized least squares 



(GLS) estimation instead of ordinary least squares. The GLS estimator is 



(3.7) 



where Q e = Cov(A). The prediction procedure leading to (2.2) can then be modified accordingly. In 
view of the Gauss-Markov theorem, GLS outperforms OLS in the given setting. The main difficulty 



in the implementation of (3.7) is to obtain an adequate estimator for Q e . To achieve this goal, 



information on the errors Sk of model (3.4) needs to be extracted. Since they cannot be directly 
observed, a two-step procedure is proposed. In the first step, the model is estimated by OLS. We 
denote the resulting residuals by 

Sk 



n 



n P r fc-i- 



In the second step, the residuals Sk are used to obtain an estimator for f2 e , which can then be plugged 
into ( |3.7[ ). The estimator we propose then is obtained as follows. Let O be the zero matrix in M. pxp . 
For b € {1, . . . , n - 1}, define V h {b) = O if h > b and 

n—h 



v h (b) = -Y,(Sk+h-s n )(6 k -s n y 



k=l 



where S n = - Ylk=i Set then 





yiP) yiP) . 


■ e\\ 




V-n+1 v -n+2 





The parameter b determines the number of lags which are taken into account for the estimation 
of the cross-covariances. Choosing b small has the advantage that the resulting banded matrix Qf 
is easier to invert. Note that inversion can be a difficult problem as has np rows. It will be 
shown in Appendix A. 2 that the correlation of the model errors Sk decays exponentially fast, thereby 
justifying the choice of small b in the estimation. Furthermore, it is evident that the estimator for the 
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cross-covariances at higher lags are more volatile, as the number of observations which are far enough 
apart becomes small. On the other hand, if b is chosen too small, much of the dependence could 
potentially be ignored. A discussion on related issues may be found, for example, in Section 3.2.1 of 
Liitkepohl [TB] . 

The simulations reported below indicate that the gains in efficiency for GLS are negligible in the 
settings considered. This is arguably due to the fact that possible improvements may be significant 
only for small sample sizes. In this case, however, estimation of fijj is almost infeasible. GLS maybe 
applied if there is some preliminary estimate for f2|, for example obtained from historical data. 

4 Prediction with covariates 

In many practical problems, such as in the particulate matter example presented in Section [6j pre- 
dictions could not only contain lagged values of the functional time series of interest, but also other 
exogenous covariates. These covariates might be scalar, vector-valued and functional. Formally the 
goal is then to obtain a predictor Y n+ h given observations of the curves Y\ , . . . , Y n and a number of 
covariates Xn \..., X n r \ The exogenous variables need not be defined on the same space. (X n ^ 

(2) (3) (2) 

could be scalar, Xn a function and X n could contain lagged values of X n ). The following adap- 
tation of the methodology given in Algorithm [T] is derived under the assumption that (Y& : k G Z) as 
well as the covariates (X$ : n G N) are stationary processes in their respective spaces. The modified 
procedure is summarized in Algorithm [3j 

The first step of Algorithm [3] is expanded compared to Algorithm [T] Step 1(a) performs FPCA on 
the response time series curves Y\ , . . . , Y n , In Step 1 (b) , all functional covariates are first transformed 
via FPCA into empirical FPC score vectors. For each functional covariate, a different number of 
principal components can be selected. Vector-valued and scalar covariates can be used directly. All 
exogenous covariates are finally combined into one vector R n in Step 1(c). 

Details for Step 2 and the one-step ahead prediction case h = 1 could be as follows. Since 
stationarity is assumed for all involved processes, the resulting FPC scores form stationary time 
series. Define hence 

T YY (i) = Cav(Y%,Y%_i), T YR (i) = Cov(Yt,RU), T RR = Cov( J R e fe ,i?|) 
and notice that these matrices are independent of k. Fix again m G {1, . . . , n}. The best linear pre- 
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Algorithm 3 Functional Prediction with Exogenous Covariates 



1. (a) Fix p. For k = 1, . . . , n, use the data Yi,...,Y n to compute the vectors 

y% = (y e k,i,---,y e k,pY, 

containing the first p empirical FPC scores y%o = (Yk, ve)- 

(b) For a functional covariate, fix q. For k = 1, . . . , n, use the data X±, . . . , X n to compute the 
vectors 

X% = ( x fc,i> • • • > x t, P ) i 

containing the first q empirical FPC scores x e ki = (X^, W£). Repeat this step for each functional 
covariate. 

(c) Combine all covariate vectors into one vector R^ = (R^, . . . , R% r )' '■ 

2. Fix h. Use Y\ , . . . , Y e n and R e n to determine the h-step ahead prediction 

^ n+h = {Vn+h,li • • • i Vn+h,p) 

for Y e n+h with an appropriate multivariate algorithm. 

3. Use the functional object 

Y n+h = y e n+hl «! + ■■■ + y e n+h , p v P 
as /i-step ahead prediction for Y n+ h- 



dictor Y n+l of Y^ +1 given the vector variables Y^, ■ ■ ■ , Y^_ m+1 , R^ can be obtained by projecting 
each component y^ + \ i of ^n+i onto sp{y^ ^ 1 < i < p, 1 < j < r, n — to + 1 < < n}. Then 
there exist p x p matrices $>i and a p x r matrix 6, such that 

y^ +1 = ^Y e n + $ 2 n-i + • • • + $™n- m+ i + &K- 

Using the projection theorem, it can be easily shown that the matrices <£>i, . . . , <3? m , are characterized 
by the equations 

T YY (i + 1) = $iTyy (i) + • • • + $ m r K y (i + 1 - m) + erjnr(i), i = 0, . . . , m - 1; 

iYh(i) = ^ir ri? (o) + • • • + $ m r rH (i - m) + er KK . 



Let 



/ r yK (o) 
r YY (-i) 



ryy(l) 

iYr(o) 



ryy(l — m) ryy(2 — m) 

V Fry (0) r Hy (l) 



ryy(m-l) r ri? (0) \ 
T YY (m-2) T YR (-1) 

T YY (0) T YR (l-m 
T RY (m-l) T RR (0) J 
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Assuming that T has full rank, it follows that 

($ l9 * 2 , . . . , $ m , 8) = (Tyy(l), . . . , ryy(m), ^(l))^ 1 . 

The matrices ryy(i), ry^(i) and Tj^r have to be replaced in practice by the corresponding sample 
versions. This explains why predictions should not be made conditional on all data Yi, . . . , Y n . It 
would involve the matrices ryy(n),ryy(n — 1), . . . which cannot be reasonably estimated from the 
sample. 

5 Simulations 

To analyze the finite sample properties of the various prediction methods, a comparative simula- 
tion study was conducted and the results are reported in this section. The set-up consisted of ten 
cubic £>-spline functions v\,. . . ,v\q on the unit interval [0, 1], which together determine the (finite- 
dimensional) space H = sp{ui, . . . , i>io}- Innovations were defined by setting 

10 

e k (t) = J2A k/ v e (t), (5.1) 
l=\ 

where (^4fe,i, • • ■ , ^4.fc,io)' were i.i.d. random vectors with mean zero and independent t4-distributed 
components. The four degrees of freedom were selected in order to keep the simulations relevant 
for the application to pollution concentrations in Section [6] for which underlying Gaussianity is an 
unrealistic assumption. However, simulations using normal instead of £4 distributed errors have lead 
to very similar conclusions and are thus not reported here. The prediction methods were tested on 
on three functional time series, namely 

(a) FAR(l): X k = + e k , 

(b) FAR(2): X k = * 1 {X k _ 1 ) + * 2 (X k _ 2 ) + e k , 

(c) FMA(l): X k = Q(e k ^) + e k . 

To generate the functional autoregressive time series in (a) the starting value X_q = ^l v ii with 

a normal random vector (N±, . . . , Niq)' ~ Af(Q, ho), was utilized. For (b), A_io = Yle=i ^£ v £, with 
(Ni, . . • , Niq)' ~ jV(0, iio), was constructed in a similar fashion. The first ten elements X-g, . . . , Xq 
were used for a burn-in in both cases. 
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Note that an arbitrary element in H has the representation x(t) = cgVt(i) with coefficients 

c = (ex, . . . , cio)'- If VP : — >■ H is a linear operator, then 

10 



£=1 
10 10 



=1 £'=1 



= (ttc)'v, 

where, by a slight abuse of notation, VP is also the matrix whose £'-th row and £-th column is 
(^f(v£), V£') and v = (vx, . . . , t>io)' is the vector of basis functions. The linear operator can thus be 
represented by a 10 x 10 matrix that operates on the coemcients in the basis function representation 
of the curves. For the FAR(l) process (a), \I/ was chosen to take the value 1/8 in the diagonal, 1/2 
in first lower off-diagonal, and 1/8 in the second lower off-diagonal, and zero else. For the FAR(2) 
process (b), Vl^i = diag(l/4, . . . , 1/4), VP 2 was chosen to take the value 1/3 in the diagonal, 1/8 in the 
first and second lower off-diagonals, and zero else. For the FMA(l) process (c), = diag(l, . . . , 1). 

The simulation results are shown in Tables [l}{3} They are based on 5,000 repetitions for each 
setting. For each of the processes in (a), (b) and (c) the sample size n and number of FPC scores 
p considered for the prediction were varied. For all combinations, median (MED Pr ), mean (MSE Pr ) 
and standard deviation (SDp r ) of the squared prediction errors f^[Pr(X n+ i)(t) — X n+ \{t)] 2 dt for the 
5,000 repetitions were computed. Here Pr stands for any of the prediction methods considered. For 
the FAR setting, Tables [l}|3] display MED F a R , MSE F ar and SD F ar. For ease of comparison, results for 
all other methods are reported relative to the forecasts obtained from Bosq's benchmark prediction 



method (3.2), that is 



RMED Pt =^, RMSE Pr = ^, RSD Pr - SD " 



MEDfar MSEfar SD FAR 

It should be noted that an approximate 95% confidence interval for the expected squared prediction 
error E fi[Pr(X n+1 )(t) - X n+1 {t)fdt is 

1 96 

MSE Pr ± — ^SD Pr . (5.2) 
\/5000 

Specifically, one-step predictions Pr(X n+ i) were obtained from Algorithm [T] based on the sample 
X\ , . ■ ■ , X n , in Step 2 using 
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• VAR.ls: fitting a VAR(l) to FPC scores by OLS; 

• VAR.gls: fitting a VAR(l) to FPC scores by GLS and banding parameter 6 = 1, and 

• Inno: employing the innovations algorithm (i.e. linear prediction) with different values of m. 

Reported are also predictions Mean (Pr(X n+ i) = 0) and Naive (Pr(A n+ i) = X n ) which were clearly 
outperformed by the other methods. The procedures based on VAR.ls and VAR.gls performed 
almost identical. The latter method performed poorly for small sample sizes and p = 3, in which 
case estimation of £l e can be problematic, but shows a slight advantage for all sample sizes when 
p = 2. Setting aside prediction of the FAR(l) time series, for which all methods gave roughly 
the same results, the application of the innovations algorithm led to improvements for virtually all 
constellations of n and p under consideration. 

6 Predicting particulate matter concentrations 

In order to demonstrate its practical usefulness, the new methodology has been applied to envi- 
ronmental data on pollution concentrations. The observations are half-hourly measurements of the 
concentration (measured in ngm~ s ) of particulate matter with an aerodynamic diameter of less than 
10/um, abbreviated PM10, in ambient air taken in Graz, Austria from October 1, 2010 until March 
31, 2011. Since epidemiological and toxicological studies have pointed to negative health effects, 
European Union (EU) regulation sets pollution standards for the level of the concentration. Policy 
makers have to ensure compliance with these EU rules and need reliable statistical tools to determine, 
and justify to the public, appropriate measures such as partial traffic regulation (see Stadlober et 
al. [21]). Accurate predictions are therefore paramount for well informed decision making. 

Functional data were obtained as follows. In a first step, very few missing intra-day data points 
were replaced through linear interpolation. A square-root transformation was then applied to the 
data in order to stabilize the variance. A visual inspection of the data revealed several extreme 
outliers around New Years Eve known to be caused by firework activities. The corresponding week 
was removed from the sample. The data was then centered and adjusted for weekly seasonality by 
subtracting from each observation the corresponding weekday average. This is done because PM10 
concentration levels are significantly different for the weekends when traffic volume is much lower. 
In the next step, 48 observations for a given day were combined into vectors and transformed into 
functional data using ten cubic S-spline basis functions and least squares fitting. The f da package 
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available for the statistical software R was applied here. Eventually, 175 daily functional observations, 
say, Y\, . . . , Y175, were obtained, roughly representing one winter season for which pollution levels are 
known to be high. They are displayed in the upper left panel of Figure [6j Shown in this figure are 
also the effect of the first three FPCs on the mean curve. Following Ramsay and Silverman |19j . a 
multiple (using the factor .5) of the £th empirical eigenfunction vg was added to and subtracted from 
the overall estimated mean curve fi to study the effect of large (small) first, second or third FPC. 
Notice that 

Y k ~ A + y% x v\ + y e k2 V2 + Vk3h, k = 1, . . . , 175, 

where yte = (Yk, vg) are the empirical FPC scores. These combine to explain about 89% of variability 
in the data. The upper right panel of Figure [6] indicates that if the first FPC score which 
explains about 72% of the variation, is large (small), then a positive (negative) shift of the mean 
occurs. The second and third FPCs are contrasts, explaining respectively 10% and 7% of variation, 
with the second FPC describing differences in the first and second half of the day and the third FPC 
indicating whether the diurnal peaks are more or less pronounced (see the lower panel of Figure [6| . 

For the comparison of the quality of the various prediction methods, the following was adopted. 
Consecutive functional observations Yk, . . . ,Yk+ n -i for selected values of 1 < n < 175 were chosen 
and used for estimation and prediction. Then, squared prediction errors 

/ [Y n+k (t)-Pr(Y n+k )(t)] 2 dt, k = 1,..., 175 -n=:N, 
Jo 

were computed, where Pr(Yfc +n ) can stand for any of the prediction methods introduced in Section[5j 
noting again that Pr(Y^ +n ) is based solely on observations Yk, ■ ■ ■ ,Yk+ n ~i- From the N resulting 
numbers median (MED Pr ), mean (MSE Pr ) and standard deviation (SD Pr ) were computed. This 
procedure was performed for the values n = 20, 40, 60 and 80. Results are reported in Table |4j 
Confidence intervals for the expected square prediction error E [Y n+ k(t) — Vr(Y n+ k)(t)] 2 dt may 



be obtained analogously to (5.2). 

It can be seen that the prediction methods Mean and Naive are not competitive. The methods 
VAR.ls and FAR give almost identical results for mean squared prediction error, thus corroborating 
the theoretical findings. In accordance with the simulation study the innovations algorithms with 
m = 2 and m = 3 generally provide the best predictions among the methods that do not invoke 
covariates. While the mean of the squared prediction errors is not much different for VAR.ls, FAR 
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CM 




Figure 1: Square-root transformed PM10 observations with fat overall mean curve (upper left panel), 
effect of the first FPC (upper right panel), effect of the second FPC (lower left panel), and effect of 
the third FPC (lower right panel). 
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and Inno and n = 20, 40, differences become apparent for the larger choices n = 60, 80. The findings 
are similar for the medians. Since the first three principal components already describe close to 90% of 
the data variability, the addition of another FPC score (p = 4) did not lead to further improvements 
and prediction results for this case are hence not displayed. 

PM10 concentrations are known to be high at locations suffering from severe temperature in- 
versions such as the basin areas of the Alps. Following Stadlober et al. |21j . temperature difference 
between Graz (350m above sea level) and Kalkleiten (710m above sea level) can be utilized to model 
this phenomenon. Temperature inversion is often seen as a key factor influencing PM10 concen- 
trations because temperatures increasing with sea level result in a sagging exchange of air, thereby 
yielding a higher pollutant load at the lower elevation. This has been impressively captured in a study 
conducted on behalf of the ZAMG Regionalstelle Steiermark, for which balloon probes were used to 
analyze the diffusion of PM10 with respect to local meteorological variables. Detailed explanations 
of the experiment may be found in [IT]. As a graphical illustration, PM10 concentrations at different 
altitude (vertical axis) and times (horizontal axis, between 6.10 am and 8.15 pm) are displayed in the 
left panel of Figure |6j while the right panel shows the corresponding temperature values. The peaks 
at ground level around 9 am and 6 pm coincide with rush hour traffic. When temperature inversion 
begins to weaken, air exchange among different atmospheric layers leads to an almost uniform vertical 
spread of PM10 and later to a decrease of pollution concentration at ground level. 

To illustrate functional prediction with covariates, temperature difference curves of Graz and 
Kalkleiten have been included as a dependent variable. For the overall sample, the first three FPCs 
of the temperature difference curves describe about 95% of the variance. FPCA was used for co- 
variate dimension reduction using q = 3, leading to the inclusion of a three-dimensional exogenous 
regressor (which is almost equivalent to the true regressor curve) in the second step of Algorithm [3j 
Results for the mean squared prediction errors are summarized under the label CTD (covariate tem- 
perature difference) in Table |4j performing the predictions in the same way as above. A significant 
improvement in the mean and median square prediction error can be observed. 
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PM10 Konzentration [Mg/m 3 ]am 17.03.2004 
in Graz-Gries, Firma Roche 



Temperatur [ °C ] am 17.03.2004 
in Graz-Gries, Firma Roche 





Figure 2: PM10 concentration (left) and temperature (right) for March 17, 2004. The vertical axis 
is altitude above ground (0-400m), the horizontal axis is time (6.10am to 8.15pm). 



7 Conclusions 

This paper proposes a new prediction methodology for functional time series that appears to be widely 
and easily applicable. It is based on the idea that dimension reduction with functional principal com- 
ponents analysis should lead to a vector- valued time series of FPC scores that can be predicted with 
any existing multivariate methodology, parametric and nonparametric. The multivariate prediction 
is then transformed to a functional prediction using a truncated Karhunen-Loeve decomposition. 

The proposed methodology seems to be advantageous for several reasons. Among them is its 
intuitive appeal, made rigorous for the predominant FAR(l) case, but also its ease of application as 
existing software packages can be readily used, even by non-experts. It is in particular straightforward 
to extend the procedure to include exogenous covariates into the prediction algorithm. Simulations 
and an application to pollution data suggest that the proposed method leads to predictions that are 
always competitive with and often superior to the benchmark predictions in the field. 

Future research could look into fine-tuning the proposed algorithms and developing automatic 
procedures to select the number of FPC scores p (and q if covariates are considered) as well as the 
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number m used to run the innovations algorithm. 



A Theoretical considerations 
A.l Proof of Theorem 13.11 



Recall the notations introduced above equation (3.4). In order to prove the asymptotic equivalence 



between Y n +i in (2.2) and Y n +i in (3.2) for the case of FAR(l) functional time series, observe first 
that 



1 



n — 1 



-X e 'X l 



where T p is the pxp matrix with entries r p (£, £') = Sfc=i Vk iUk I 1 determined by the FPC scores 



yl i = (Yk,Vi), and ® signifies the Kronecker product. With the help of (3.6), the VAR(l) based 



predictor (2.2) can be written in the form 

1 



Y n+1 



n — 1 



mat ( \I p ®f- l ]X e 'Z e 



with v = ,v p )' being the vector of the first p empirical eigenfunctions. On the other hand, 

defining the pxp matrix T p by the entries T p (£,£') = ^ X)a=i V% iVk I' = diag(Ai, . . . , A p ), direct 



verification shows that (3.2) takes the form 

1 



Y n +i 



n — 1 



if 



mat ( \I p ®t~ 1 ]X e 'Z e 



v. 



The only formal difference between the two predictors under consideration is therefore in the matrices 
T p and T p . Now, for any £,£' = 1, . . . ,p, 



1 1 



r p (£,£') = r p (£,£') + —-Y J ytey e k 



i 



n — 1 



6 G 

UnlUnl 1 



t p (£,£') + 



n — 1 



k=l 

X i I{£ = e'} + yl £ yt e ), 



so that 



In the following || • || will be used with a slight abuse of notation not only to indicate I? norm, but 
also Euclidean norm in W and matrix norm ||^4|| = supii x M =1 ||^4x||, for a square matrix A G ]R pxp . 



Let 



A = mat ([I p (S (f; 1 - f; 1 )] -^jX e, Z e ^j . 
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The orthogonality of the vg together with Pythagoras' theorem and Bessel's inequality imply that 



\\Y n+1 -Y n+1 f = ||A'nir< I|a||-'hf;, 

Define S = maX{-^—rX e 'Z e \ and notice that 
\ n— 1 ' 



|A|| 2 J>. 



e ^ 2 < ||A|| 2 ||yj| 2 . 



IAI 



(r p 1 r p 1 )s\\ < ||f p 1 fp 1 ]]!^! 



Let iu = . . . , itfp)'. Since S(£,£') = ^-j- X)fc=i eVk+i t'-> iterative applications of the Cauchy- 
Schwarz inequality yield 

p / p , n— 1 \ 2 



|5|| 2 = sup fE^^fl^ 

ll^ll=i^ = i V =1 fc= i 

P P / . n-1 

£=1 i'=l v fc=l 
p p n 1 " 

i n 



= Op(1). 

It remains to estimate ||r~ 1 — T~ 1 1| . The next step consists of using the fact that, for any A, B £ M pxp , 
it holds that (A + S)" 1 = A' 1 - A~ X (I + BA~ 1 )~ 1 BA~ 1 , provided all inverse matrices exist. Now 
choosing A = T p and B = T p — T p , it can be seen that 



f p 1 [/ p + (fp-fp)f; 1 ] 1 (f p -Ap)Ap- 



< llf _1 ll 2 llf — f I 
— II p li ll p p\ 



[i p + (£ p -f p )f } 



-n- 1 



< 



t+2\ 



r — r 



Op 



Putting together all results, the statement of Theorem 3.1 is established 



A. 2 Analysis of theoretical model errors 



The arguments in Section 3.2 show that the errors of model (3.4) have a rather complicated form 



and it is generally infeasible to explicitly determine £l e . The main reason for this is that empirical 
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eigenfunctions were used for the projections. On the other hand, Hormann and Kokoszka [14J have 
noted that the empirical eigenfunctions V£ are -^/n-consistent for the population eigenfunctions v# 
(up to random signs) under mild conditions that are assumed to be satisfied in the following. This 
suggests that for a theoretical analysis one can work with the population eigenfunctions vg instead. 



Using these, (3.4) becomes 



Y k = BpYk-x + rj k , 



(A.l) 



where rj k = e k + u k . Stacking the vectors in (A.l) one obtains in analogy to (3.5) the regression 



Z = X(3 p + E + U. 

Defining = Cov(E + U), the generalized least squares estimator for this problem becomes 

PpiGLS) = (X'9r 1 Xy 1 X'VL' 1 Z. 

Note that the r\ k are the theoretical counterparts of S k and S k . Their second-order structure can, 
however, be computed explicitly. First note that E[r] k ] = 0. Defining Wh = Cov(r] k+h ,rj k ), it holds 
that 







W W x W 2 
W-i W Wi 



Stationarity implies that 

W h = Ele^he^} + Elux+he^} + E[e l+h u' x ] + E[ui +h «i] 
= Y. h + D T h + D h + C h , 

where = ^[ei+^e^], Dh = E[ei + hu[] and Ch = E[ui + hu'^\. By assumption, T,h = O, the p x p 
zero matrix, if h ^ 0. Also, Dh = O for h > 0. Hence 



n 



Co C\ C*2 • • • 

x Go Oi • • • 
u 2 o 1 Oo • • • 



+ 



' S D'_ x D'_ 2 
D-i S 
D-2 £>-i S 



Using the definitions of and u^, the elements of Ch and D_h are for £,£' = 1, . . . ,p computed as 

(A.2) 



oo oo 



C h (£,£')= E ^K^i+h,^)^,^)]^^),^)^^),^), 

4=p+1^2=P+l 

OO 

D^ h (£,£') = E[(Y ,v ei )(s^ h ,v £ )}(^(v h ),ve). 
h=p+i 



(A.3) 
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The following lemma shows that the matrices and C% decrease rapidly to O when h — v oo. This 
indicates that in general relatively small values of b for the banded covariance estimator Clf may be 
chosen. 

Lemma A.l. Let denote the adjoint operator of * . If Eq £ L? h , then there is a constant c which 
depends only on the distribution of £q and *, such that 

\c^n<c\\n h cW{v e )\\wMi 
\D_ h (t,i')\< c \\n h cWMi 

for any h > 1. 

Proof. Using Example 2.1 in Hormann and Kokoszka [14J, one can define for any h > a sequence 
(Y^ h) : k G Z) having the same marginal distribution as the FAR(l) process {Yk '■ k E Z) and satisfying 
that {Xk+m : m — ^) anc ^ C^fc-f : ^ — 0) ar e independent for any fceZ and 

^.(n - n w ) = (^Din - n w n 2 ]) 1/2 < ci Meo)\\n h c- 

This and the Cauchy-Schwarz inequality imply that 

|i?[(y 1+ ,,^)(ri, % )]| = \E[{Y 1+h -Y±%v tl ){Y u v t2 ) 

<^(yfc-y fc (ft) )^(yi) 

<c 2 */ 2 ( £o )||*||£. 



Notice that ^2(^1) depends only on the distribution of £0 and *. Hence subsequently using (A. 2), 
the Cauchy-Schwarz inequality and Parseval's identity, leads to 



00 00 



\c h (i,£')\<c\\n h c E E \(*M>vi)(*M,vt>)\ 

£ 1=p +l £ 2 =p+l 



E <*(^)^> 2 ) E 

/ 00 v 1/2 1 00 



1/2 / \ 1/2 

(*(^ 2 ),^/) 2 



1/2 f ^ 1/2 



The second statement can be proven in a similar way and the proof is complete. □ 
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1.004 


1.002 


0.999 


1.005 


1.005 


1.005 


OU 


Naive 




1.156 


1.164 


1.164 


1.245 


1.229 


1.191 




Mean 




1.438 


1.422 


1.413 


1.526 


1.481 


1.383 




Inno, m = 


1 


1.000 


1.000 


1.000 


1.000 


1.000 


1.001 




Inno, m = 


2 


0.999 


1.002 


1.007 


1.009 


1.009 


1.007 




Inno, m = 


3 


1.008 


1.008 


1.014 


1.024 


1.022 


1.014 




FAR 


1.280 


1.560 


1.262 


1.187 


1.437 


1.086 




VAR.ls 




1.000 


1.000 


0.995 


1.000 


1.001 


1.001 




VAR.gls 




0.999 


1.000 


0.995 


1.008 


1.003 


1.003 




Naive 




1.201 


1.176 


1.128 


1.279 


1.262 


1.193 




Mean 




1.467 


1.414 


1.228 


1.575 


1.500 


1.270 




Inno, m = 


1 


1.000 


1.000 


1.000 


1.001 


1.000 


1.000 




Inno, m = 


2 


1.004 


1.000 


1.002 


1.006 


1.000 


0.992 




Inno, m = 


3 


1.005 


1.003 


1.002 


1.011 


1.005 


0.994 




FAR 


1.268 


1.533 


1.088 


1.149 


1.444 


2.250 




VAR.ls 




1.000 


1.000 


1.001 


r\ r\r\r\ 

0.999 


1.000 


1 C\C\f\ 

1.000 




VAR.gls 




1.000 


1.000 


1.000 


1.004 


1.001 


0.999 


320 


Naive 




1.200 


1.186 


1.196 


1.296 


1.265 


1.054 




Mean 




1.490 


1.421 


1.362 


1.612 


1.529 


1.069 




Inno, m = 


1 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 




Inno, m = 


2 


0.999 


0.999 


1.003 


1.006 


1.000 


0.998 




Inno, m = 


3 


0.999 


1.000 


1.005 


1.008 


1.003 


0.999 



Table 1: MED F a R , MSE F a R and SD F ar obtained from 5,000 repetitions are presented when the data 
generating process follows an FAR(l) model. For the competing methods Pr equal to VAR.ls, 
VAR.gls, Naive, Mean and Inno relative values RMED Pr , RMSE Pr and RSD Pr are presented. 
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p = 2 


p = 3 


1 i 


method 




med 


mean 


SU 


med 


mean 


set 




FAR 


1.347 


1.632 


1.245 


1.348 


1.606 


1.122 




VAR.ls 




1.028 


1.037 


1.061 


1.042 


1.070 


1.136 




VAR.gls 




1.066 


1.468 


27.318 


1.085 


3.458 


177.030 


9fl 


Naive 




1.103 


1.093 


1.073 


1.094 


1.087 


1.061 




Mean 




1.203 


1.194 


1.382 


1.173 


1.160 


1.098 




Inno, m = 


1 


1.001 


0.998 


1.001 


0.993 


0.997 


0.996 




Inno, m = 


2 


0.989 


0.977 


0.987 


0.982 


0.987 


1.006 




Inno, m = 


Q 

o 


0.993 


0.990 


0.993 


1.010 


1.017 


1.024 




FAR 


1.324 


1.613 


1.732 


1.291 


1.586 


1.785 




VAR.ls 




1.009 


1.009 


1.002 


1.020 


1.020 


1.016 




VAR.gls 




1.037 


1.261 


10.989 


1.067 


346.569 


21665.501 




Naive 




1.126 


1.111 


1.065 


1.135 


1.124 


1.043 




Mean 




1.207 


1.192 


0.997 


1.243 


1.212 


1.112 




Inno, m = 


1 


1.000 


0.999 


1.001 


0.999 


0.999 


1.000 




Inno, m = 


2 


0.951 


0.953 


0.964 


0.947 


0.945 


0.994 




Inno, m = 


3 


0.958 


0.961 


0.968 


0.955 


0.963 


1.002 




FAR 


1.304 


1.561 


1.199 


1.280 


1.518 


1.143 




VAR.ls 




1.001 


1.005 


1.009 


1.007 


1.006 


1.012 




VAR.gls 




1.039 


1.071 


2.694 


1.045 


1.129 


7.914 


OU 


Naive 




1.118 


1.121 


1.091 


1.165 


1.161 


1.126 




Mean 




1.218 


1.221 


1.143 


1.253 


1.247 


1.310 




Inno, m = 


1 


0.998 


1.000 


1.000 


0.999 


1.000 


1.000 




Inno, m = 


2 


0.938 


0.938 


0.949 


0.901 


0.913 


0.955 




Inno, m = 


3 


0.941 


0.942 


0.953 


0.912 


0.923 


0.962 




FAR 


1.303 


1.569 


1.168 


1.274 


1.556 


1.607 




VAR.ls 




1.000 


1.001 


1.000 


1.001 


1.001 


1.000 




VAR.gls 




1.035 


1.044 


1.557 


1.032 


1.037 


1.038 


i fin 

1DU 


Naive 




1.128 


1.125 


1.097 


1.144 


1.150 


1.042 




Mean 




1.241 


1.226 


1.200 


1.249 


1.239 


1.070 




Inno, m = 


1 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 




Inno, m = 


2 


0.929 


0.928 


0.956 


0.884 


0.908 


0.971 




Inno, m = 


3 


0.930 


0.931 


0.957 


0.894 


0.914 


0.970 




FAR 


1.300 


1.543 


1.191 


1.246 


1.508 


1.291 




VAR.ls 




1.000 


1.002 


1 A1 1 

1.011 


1.000 


1.001 


1 AA1 

1.001 




VAR.gls 




1.037 


1.035 


1.043 


1.033 


1.036 


1.009 


320 


Naive 




1.132 


1.125 


1.063 


1.189 


1.172 


1.095 




Mean 




1.233 


1.228 


1.299 


1.261 


1.260 


1.177 




Inno, m = 


1 


0.999 


1.000 


1.000 


1.000 


1.000 


1.000 




Inno, m = 


2 


0.918 


0.923 


0.961 


0.870 


0.887 


0.921 




Inno, m = 


3 


0.918 


0.924 


0.963 


0.873 


0.890 


0.924 



Table 2: MED F a R , MSE F a R and SD F ar obtained from 5,000 repetitions are presented when the data 
generating process follows an FAR(2) model. For the competing methods Pr equal to VAR.ls, 
VAR.gls, Naive, Mean and Inno relative values RMED Pr , RMSE Pr and RSD Pr are presented. 
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p = 2 


p = 3 


1 1 
/ L 


method 




med 


mean 


set 


med 


mean 


or! 

su 




FAR 


1.759 


2.102 


1.811 


1.753 


2.049 


1.430 




VAR.ls 




1.017 


1.028 


1.609 


1.039 


1.048 


1.063 




VAR.gls 




1.000 


1.017 


1.746 


1.033 


1.383 


14.873 


9fl 


Naive 




1.035 


1.054 


1.113 


1.035 


1.056 


1.088 




Mean 




1.036 


1.038 


0.853 


1.023 


1.050 


1.133 




Inno, m = 


2 


0.982 


0.989 


0.985 


0.988 


0.996 


1.558 




Inno, m = 


3 


0.979 


0.990 


0.985 


0.987 


0.994 


1.009 




Inno, m = 


4 


0.976 


0.989 


0.999 


0.991 


0.994 


1.006 




FAR 


1.710 


2.050 


1.461 


1.714 


2.027 


1.479 




VAR.ls 




1.008 


1.011 


0.994 


1.014 


1.011 


0.981 




VAR.gls 




1.003 


1.000 


0.981 


1.005 


1.004 


0.989 




Naive 




1.045 


1.054 


1.101 


1.084 


1.092 


1.142 




Mean 




1.065 


1.079 


1.114 


1.061 


1.059 


1.024 




Inno, m = 


2 


0.990 


0.978 


0.964 


0.990 


0.983 


0.972 




Inno, m = 


3 


0.983 


0.974 


0.962 


0.983 


0.983 


0.979 




Inno, m = 


4 


0.977 


0.972 


0.970 


0.983 


0.985 


0.988 




FAR 


1.721 


2.042 


1.509 


1.666 


1.999 


1.694 




VAR.ls 




1.003 


1.005 


1.040 


1.002 


1.016 


1.754 




VAR.gls 




0.999 


1.000 


1.020 


0.998 


1.008 


1.656 


OU 


Naive 




1.050 


1.064 


1.121 


1.077 


1.082 


0.946 




Mean 




1.055 


1.072 


1.160 


1.082 


1.119 


2.161 




Inno, m = 


2 


0.975 


0.979 


0.986 


0.974 


0.972 


0.856 




Inno, m = 


3 


0.971 


0.972 


0.981 


0.965 


0.966 


0.828 




Inno, m = 


4 


0.962 


0.968 


0.976 


0.961 


0.961 


0.817 




FAR 


1.754 


2.058 


1.408 


1.716 


2.006 


1.515 




VAR.ls 




1.001 


1.001 


0.999 


1.001 


1.002 


1.000 




VAR.gls 




0.996 


0.999 


1.002 


0.999 


1.000 


0.996 




Naive 




1.048 


1.063 


1.120 


1.083 


1.100 


1.108 




Mean 




1.056 


1.074 


1.110 


1.066 


1.098 


1.142 




Inno, m = 


2 


0.979 


0.980 


0.983 


0.962 


0.966 


0.971 




Inno, m = 


3 


0.968 


0.970 


0.978 


0.947 


0.954 


0.968 




Inno, m = 


4 


0.961 


0.966 


0.976 


0.933 


0.948 


0.967 




FAR 


1.724 


2.082 


1.773 


1.650 


1.979 


1.414 




VAR.ls 




1.000 


1.001 


1.005 


1.001 


1.009 


1.407 




VAR.gls 




1.003 


1.001 


1.006 


1.001 


1.007 


1.265 


320 


Naive 




1.068 


1.073 


1.110 


1.092 


1.099 


1.112 




Mean 




1.057 


1.067 


1.102 


1.099 


1.117 


1.933 




Inno, m = 


2 


0.975 


0.975 


0.979 


0.970 


0.965 


0.927 




Inno, m = 


3 


0.965 


0.964 


0.970 


0.952 


0.950 


0.912 




Inno, m = 


4 


0.961 


0.960 


0.966 


0.945 


0.940 


0.908 



Table 3: MED F a R , MSE F a R and SD F ar obtained from 5,000 repetitions are presented when the data 
generating process follows an FMA(l) model. For the competing methods Pr equal to VAR.ls, 
VAR.gls, Naive, Mean and Inno relative values RMED Pr , RMSE Pr and RSD Pr are presented. 
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p = 2 


p = 3 


n 


method 




mod 


mean 


sd 


med 


mean 


sd 




FAR 


1.58 


2.43 


2.46 


1.53 


2.36 


2.36 




VAR.ls 




1.64 


2.45 


2.43 


1.58 


2.41 


2.34 




Mean 




1.69 


2.88 


3.28 


1.69 


2.88 


3.28 


Ay) 


Naive 




1.81 


2.57 


2.67 


1.81 


2.57 


2.67 




Inno, m = 2 




1.58 


2.38 


2.49 


1.71 


2.37 


2.29 




Inno, m = 3 




1.55 


2.45 


2.50 


1.78 


2.47 


2.32 




CTD, m = 2, q = 


3 


1.41 


2.37 


2.70 


1.65 


2.57 


2.88 




FAR 


1.51 


2.37 


2.42 


1.40 


2.29 


2.38 




VAR.ls 




1.51 


2.37 


2.40 


1.39 


2.31 


2.40 




Mean 




1.69 


2.87 


3.28 


1.69 


2.87 


3.28 




Naive 




1.73 


2.58 


O TO 

2.78 


1.73 


2.58 


2.78 




Inno, m = 2 




1.38 


2.37 


2.52 


1.43 


2.29 


2.39 




Inno, m = 3 




1.51 


2.38 


2.56 


1.53 


2.27 


2.38 




CTD, m = 2, q = 


3 


1.40 


2.22 


2.32 


1.34 


2.16 


2.25 




FAR 


1.61 


2.24 


2.22 


1.36 


2.03 


2.17 




VAR.ls 




1.63 


2.23 


2.20 


1.32 


2.03 


2.17 




Mean 




1.79 


3.09 


3.48 


1.79 


3.09 


3.48 


60 


Naive 




1.69 


2.63 


2.96 


1.69 


2.63 


2.96 




Inno, m = 2 




1.44 


2.09 


2.09 


1.17 


1.98 


2.06 




Inno, m = 3 




1.41 


2.10 


2.09 


1.42 


2.04 


2.28 




CTD, m = 2, q = 


3 


1.16 


1.94 


1.96 


1.06 


1.95 


2.07 




FAR 


1.59 


2.20 


2.20 


1.30 


1.95 


2.17 




VAR.ls 




1.51 


2.19 


2.21 


1.25 


1.95 


2.22 




Mean 




1.82 


3.32 


3.70 


1.82 


3.32 


3.70 


80 


Naive 




1.91 


2.84 


3.13 


1.91 


2.84 


3.13 




Inno, m = 2 




1.31 


2.01 


2.07 


1.25 


1.91 


2.02 




Inno, m = 3 




1.47 


1.98 


2.02 


1.22 


1.89 


2.09 




CTD, m = 2, q = 


3 


1.26 


1.80 


1.73 


1.21 


1.82 


1.80 



Table 4: MED Pr , MSE Pr and SD Pr are shown for prediction methods Pr equal to FAR, VAR.ls, 
VAR.gls, Mean, Naive, Inno and CDT. 
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